
Enable no-src GPU SDMA transfers #295

Open
nileshnegi wants to merge 2 commits into candidate from users/nileshnegi/feat/dma-memset

@nileshnegi
Collaborator

Motivation

Extends EXE_GPU_DMA to accept zero-source transfers, enabling SDMA-driven memset without introducing a new executor type.

Technical Details

Fill value:

  uint32_t fillVal = bit_cast<uint32_t>(MEMSET_VAL);  // 0x4B4B4B4B
  size_t count = numBytes / sizeof(uint32_t);          // count is in uint32_t units

0x4B4B4B4B matches both memset(MEMSET_CHAR) (used by dstReference[0]) and MemsetVal<float>() (used by the GFX 0-src kernel), so the existing correctness validation passes without changes.
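A minimal check of the fill-value arithmetic, as a sketch only. It assumes MEMSET_VAL is the float 13323083.0f (whose IEEE-754 bit pattern is exactly 0x4B4B4B4B) and MEMSET_CHAR is 75 (0x4B); both values are assumptions for illustration, not taken from this PR. It uses std::memcpy, which gives the same result as C++20 std::bit_cast here, and rounds numBytes down to whole uint32_t words as the count formula does. FillWord, FillCount and MatchesByteMemset are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed values for illustration; check the real definitions in TransferBench.
constexpr float   kMemsetVal  = 13323083.0f;  // hypothetical stand-in for MEMSET_VAL
constexpr uint8_t kMemsetChar = 75;           // hypothetical stand-in for MEMSET_CHAR (0x4B)

// Reinterpret the float's bits as a 32-bit word (same result as std::bit_cast).
uint32_t FillWord(float value) {
  static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32-bit");
  uint32_t word;
  std::memcpy(&word, &value, sizeof(word));
  return word;
}

// hsa_amd_memory_fill counts uint32_t elements, not bytes; trailing bytes
// that do not form a whole word are dropped by the integer division.
size_t FillCount(size_t numBytes) {
  return numBytes / sizeof(uint32_t);
}

// True if a byte-wise memset with `c` yields the same 32-bit word as `word`.
bool MatchesByteMemset(uint32_t word, uint8_t c) {
  uint32_t fromMemset;
  std::memset(&fromMemset, c, sizeof(fromMemset));
  return fromMemset == word;
}
```

Because every byte of 0x4B4B4B4B is 0x4B, the word-wise SDMA fill and the byte-wise memset reference produce identical buffers, which is why validation needs no changes.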

Validation:

  • srcs.size() != 1 → srcs.size() > 1, so 0 sources are now valid on AMD
  • A 0-src transfer with exeSubIndex != -1 results in a fatal error
  • Copy-agent-selection warnings wrapped in if (!t.srcs.empty())
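The validation rules above can be sketched as a single check. This is a hypothetical, simplified model: TransferSketch and ValidateDmaTransfer are illustrative names, not TransferBench's types, and the NVIDIA restriction is passed in as a runtime bool for testability, whereas the commit message says the real code is gated at compile time on __NVCC__.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified transfer description for illustration only;
// the real TransferBench transfer struct has many more fields.
struct TransferSketch {
  std::vector<int> srcs;   // source memory descriptors (empty = no-src fill)
  int exeSubIndex = -1;    // -1 means no specific SDMA engine was requested
};

// Returns an empty string if the DMA transfer is acceptable,
// otherwise a message describing the fatal error.
std::string ValidateDmaTransfer(const TransferSketch& t, bool isNvidiaBuild) {
  if (t.srcs.size() > 1)
    return "DMA transfers support at most one source";
  if (t.srcs.empty()) {
    if (isNvidiaBuild)
      return "0-src DMA transfers are only supported on AMD";
    if (t.exeSubIndex != -1)  // the fill API cannot target a specific engine
      return "0-src DMA transfers cannot select an SDMA engine";
  }
  return "";
}
```

Note that a 1-source copy may still request a specific engine; only the no-source case rejects a subindex.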

Resource setup:

  • hsa_amd_pointer_info on srcMem[0] guarded by !rss.srcMem.empty()
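A sketch of the guarded source lookup, assuming a hypothetical ResourcesSketch struct and a caller-supplied queryAgent callback that stands in for the hsa_amd_pointer_info query. The point is only the ordering: check emptiness before indexing srcMem[0].

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-transfer resources; the real struct
// holds device pointers and HSA agent handles.
struct ResourcesSketch {
  std::vector<void*> srcMem;
};

// Returns the source agent id, or std::nullopt for a 0-src (fill) transfer.
// `queryAgent` stands in for a pointer-info lookup such as hsa_amd_pointer_info.
std::optional<int> SourceAgent(const ResourcesSketch& rss,
                               const std::function<int(void*)>& queryAgent) {
  if (rss.srcMem.empty())
    return std::nullopt;  // no source pointer; srcMem[0] would be out of bounds
  return queryAgent(rss.srcMem[0]);
}
```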

Constraints:

  • AMD only. NVIDIA builds get a fatal error for 0-src DMA transfers.
  • No engine selection: hsa_amd_memory_fill has no engine parameter or mask, so combining 0 sources with an engine subindex (e.g., n d0.2 g1) is rejected at validation.

Test Plan

Test Result

Submission Checklist

Allow EXE_GPU_DMA transfers with zero sources to perform a memset using
hsa_amd_memory_fill, which enqueues a LINEAR_FILL operation on the SDMA
engines.

Fill value:
  uint32_t fillVal = bit_cast<uint32_t>(MEMSET_VAL);  // 0x4B4B4B4B
  size_t count = numBytes / sizeof(uint32_t);          // count is in uint32_t units

0x4B4B4B4B matches both memset(MEMSET_CHAR) (used by dstReference[0])
and MemsetVal<float>() used by the GFX no-src kernel, so existing
correctness validation passes without changes.

Validation changes (AMD only, gated on !__NVCC__):
- DMA no-src is now valid; rejected only on NVIDIA builds
- DMA no-src with a specific SDMA engine (e.g. "n d0.2 g1") is rejected
  because hsa_amd_memory_fill has no engine-selection parameter
- Copy-agent-selection warnings guarded by !t.srcs.empty() to avoid
  out-of-bounds access when no source is specified

Execution changes (ExecuteDmaTransfer):
- No-src fill path hoisted ahead of the hipMemcpy and HSA async-copy branches
- Copy paths (hipMemcpy and HSA async copy) unchanged
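The branch ordering in ExecuteDmaTransfer can be modeled as a small dispatch function. DmaPath, SelectDmaPath and the useHipMemcpy flag are hypothetical names for illustration; how TransferBench actually chooses between hipMemcpy and HSA async copy is not described here, so the flag is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the three DMA execution paths.
enum class DmaPath { kHsaFill, kHipMemcpy, kHsaAsyncCopy };

// The no-src check comes first, so the two copy branches are reached
// only when the transfer actually has a source.
DmaPath SelectDmaPath(size_t numSrcs, bool useHipMemcpy) {
  if (numSrcs == 0)
    return DmaPath::kHsaFill;      // would call hsa_amd_memory_fill on the destination
  if (useHipMemcpy)
    return DmaPath::kHipMemcpy;    // existing copy path, unchanged
  return DmaPath::kHsaAsyncCopy;   // existing copy path, unchanged
}
```

Hoisting the fill check means neither copy branch needs its own empty-source guard.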

HSA resource setup:
- srcMem pointer-info query guarded by !rss.srcMem.empty()

Co-authored-by: Claude <claude@anthropic.com>
Copilot AI review requested due to automatic review settings May 10, 2026 14:26
@nileshnegi nileshnegi requested a review from a team as a code owner May 10, 2026 14:26

Copilot AI left a comment


Pull request overview

This PR extends the EXE_GPU_DMA executor to accept “0-source” transfers on AMD platforms, enabling SDMA-based memset/fill behavior without introducing a new executor type.

Changes:

  • Relax DMA transfer validation from “exactly 1 source” to “0 or 1 sources”, while rejecting 0-src DMA on NVIDIA builds.
  • Add a 0-src DMA execution path that uses hsa_amd_memory_fill() to fill destinations with the existing MEMSET_VAL byte pattern.
  • Guard source-agent setup and DMA copy-agent-selection warnings so they don’t dereference srcs[0] when the transfer has no sources.


Comment thread src/header/TransferBench.hpp
The 0-src fill path records no HIP events (hsa_amd_memory_fill is
not a HIP call), so querying hipEventElapsedTime after a 0-src DMA
transfer failed with an "invalid resource handle" error. Guard the
HIP event timing path with !resources.srcMem.empty(); the fill path
instead falls back to CPU wall-clock time, which is accurate because
hsa_amd_memory_fill is synchronous.

Co-authored-by: Claude <claude@anthropic.com>
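A sketch of the timing fallback described in this commit, under stated assumptions: TransferTimeMs is a hypothetical helper, runTransfer stands in for the actual DMA work, and gpuEventMs stands in for a hipEventElapsedTime query, which is only meaningful when events were recorded (the copy paths). The wall-clock fallback is accurate only if the fill call blocks until completion, as the commit message states.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical timing helper. For transfers with a source, HIP events exist,
// so the GPU-side elapsed time is returned; for a 0-src fill, no events were
// recorded, so host wall-clock time around the synchronous call is used.
double TransferTimeMs(bool hasSrc,
                      const std::function<void()>& runTransfer,
                      const std::function<double()>& gpuEventMs) {
  auto start = std::chrono::steady_clock::now();
  runTransfer();  // for the fill path this returns only after the fill completes
  auto stop = std::chrono::steady_clock::now();
  if (hasSrc)
    return gpuEventMs();  // safe: events were recorded on the copy paths
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The key property is that the event query is never reached for a 0-src transfer, which is what avoids the "invalid resource handle" error.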